class: center, middle, inverse, title-slide

# Normal Distributions and Rescaling

### S. Mason Garrison

---
layout: true

<div class="my-footer">
<span>
<a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Methods in Psychological Research</a>
</span>
</div>

---
class: middle

# Normal Distribution

---
## Normal Distribution

Definition: a particular bell-shaped curve with the density function

`\(f(x)= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\)`

- The formula has two parameters:
  - the mean `\(\mu\)`
  - the standard deviation `\(\sigma\)`
- The standard normal `\((\mu=0; \sigma=1)\)` simplifies the equation to `\(f(z)= \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^{2}}\)`

---
# Normal Distributions with Different Means

- The mean is located at the center of the symmetric curve and is the same as the median.
- Changing `\(\mu\)` without changing `\(\sigma\)` moves the Normal curve along the horizontal axis without changing its variability.

.small[
<img src="rnorms_files/figure-html/norm-1.png" width="55%" style="display: block; margin: auto;" />
]

---
.small[
```r
#### Normal Distribution
# Display normal distributions that share sd = 1 but have different means
x <- seq(-10, 10, length.out = 1000)  # grid of x values
hx <- dnorm(x)                        # standard normal density (mean 0, sd 1)
colors <- c("red", "blue", "green", "gold", "black")
plot(x, hx, type = "l", lty = 2, xlab = "x value", ylab = "Density",
     main = "Comparison of Normal Distributions", xlim = c(-5, 7))
location <- c(2, 4, -2)               # alternative means
for (i in seq_along(location)) {
  lines(x, dnorm(x, mean = location[i]), lwd = 1, col = colors[i])
}
```

<img src="rnorms_files/figure-html/unnamed-chunk-2-1.png" width="40%" style="display: block; margin: auto;" />
]

---
# Normal Distributions with Different Standard Deviations

.pull-left[
- The standard deviation `\(\sigma\)` controls the variability of a Normal curve.
- When the standard deviation is larger, the area under the Normal curve is less concentrated about the mean.
- The standard deviation is the distance from the center to the change-of-curvature (inflection) points on either side.
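
A quick numeric check of the last two bullets, as a minimal sketch that is not part of the original slides. `pnorm()` gives the area within 1 unit of a mean of 0 for several values of `\(\sigma\)`. `f2` is a hypothetical helper for the second derivative of the Normal density, `\(f''(x)=f(x)\left[\frac{(x-\mu)^{2}}{\sigma^{2}}-1\right]/\sigma^{2}\)`, whose sign flips at `\(x=\mu\pm\sigma\)`.

```r
# Area within +/- 1 unit of a mean of 0, for three standard deviations
sds <- c(0.5, 1, 2)
area_within_1 <- pnorm(1, sd = sds) - pnorm(-1, sd = sds)
round(area_within_1, 3)  # 0.954 0.683 0.383: shrinks as sigma grows

# Hypothetical helper: second derivative of the Normal density,
# f''(x) = f(x) * ((x - mu)^2 / s^2 - 1) / s^2, which is 0 at x = mu +/- s
f2 <- function(x, mu = 0, s = 1) {
  dnorm(x, mean = mu, sd = s) * ((x - mu)^2 / s^2 - 1) / s^2
}
sigma <- 2
# Curvature is negative just inside x = sigma and positive just outside it
sign(f2(c(sigma - 0.01, sigma + 0.01), s = sigma))
```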
]

.pull-right.small[
<img src="rnorms_files/figure-html/zsd-1.png" width="90%" style="display: block; margin: auto;" />
]

---
.small[
```r
# Display normal distributions that share mean = 0 but have different standard deviations
# (reuses x, hx, and colors from the previous chunk)
plot(x, hx, type = "l", lty = 2, xlab = "x value", ylab = "Density",
     main = "Comparison of Normal Distributions", xlim = c(-10, 10))
sds <- c(0.5, 2, 4, 6)
for (i in seq_along(sds)) {
  # index colors by position, not by the (non-integer) sd value
  lines(x, dnorm(x, sd = sds[i]), lwd = 1, col = colors[i])
}
```

<img src="rnorms_files/figure-html/unnamed-chunk-3-1.png" width="40%" style="display: block; margin: auto;" />
]

---
# Normal Distribution

.pull-left[
- In the Normal distribution with mean `\(\mu\)` and standard deviation `\(\sigma\)`:
  - approximately 68% of the observations fall within 1 `\(\sigma\)` of `\(\mu\)`
  - approximately 95% of the observations fall within 2 `\(\sigma\)` of `\(\mu\)`
  - approximately 99.7% of the observations fall within 3 `\(\sigma\)` of `\(\mu\)`
- This property is often called the 68-95-99.7 rule.
]

.pull-right[
<img src="../img/normal.png" width="95%" style="display: block; margin: auto;" />
]

---
<img src="../img/normal.png" width="70%" style="display: block; margin: auto;" />

---
# Worked Example

.pull-left[
- The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for seventh-grade students in Gary, Indiana, is close to Normal.
- Suppose the distribution is N(6.84, 1.55), that is, `\(\mu = 6.84\)` and `\(\sigma = 1.55\)`.
]

--

.pull-right[
- Sketch the Normal density curve for this distribution.
- What percent of ITBS scores fall between 3.74 and 9.94?
- What percent of the scores fall above 5.29?
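
A sketch of how these answers could be checked in R (not part of the original slides): standardize each cutoff with `\(z=(x-\mu)/\sigma\)`, then let `pnorm()` give the area under the curve.

```r
mu <- 6.84     # mean of the ITBS vocabulary scores
sigma <- 1.55  # standard deviation
# z-scores of the three cutoffs: -2, 2, and -1
z <- (c(3.74, 9.94, 5.29) - mu) / sigma
round(z, 2)
# Proportion between 3.74 and 9.94 (within 2 SDs of the mean)
between <- pnorm(9.94, mean = mu, sd = sigma) - pnorm(3.74, mean = mu, sd = sigma)
# Proportion above 5.29 (above 1 SD below the mean)
above <- pnorm(5.29, mean = mu, sd = sigma, lower.tail = FALSE)
round(100 * c(between = between, above = above), 1)  # about 95.4 and 84.1 percent
```

Both results agree with the 68-95-99.7 rule: 3.74 and 9.94 sit 2 SDs from the mean, and the area above 1 SD below the mean is 50% + 34%.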
]

--

<img src="../img/norms.png" width="55%" style="display: block; margin: auto;" />

---
<br>
<img src="../img/norms.png" width="100%" style="display: block; margin: auto;" />

---
# The Normal Distribution as a Model

- The Normal distribution is a model of the real world
  - Not exact, but a useful model for many variables
    - Physical features
    - Psychological features
    - Performance measures

--

- Not all variables are Normal
  - Skewed variables (e.g., income)
  - Many count variables (e.g., number of children, mistakes on an exam)

---
# Real World Data

.pull-left[
- Many variables approximately follow this distribution (but not all).
- The following slides show histograms of datasets we have already used in this class,
  - each overlaid with a Normal curve that has the same mean and standard deviation as the data.
]

--

.small.pull-right[
<img src="rnorms_files/figure-html/example-1.png" width="90%" style="display: block; margin: auto;" />
]

---
# Height of Children (Galton dataset)

.small[
```r
library(HistData)  # provides the Galton data
library(ggplot2)
ggplot(Galton, aes(x = child)) +
  geom_histogram(binwidth = 1, fill = "red") +
  # Normal curve with the sample mean and SD, multiplied by n so that
  # (with binwidth = 1) it is on the same count scale as the bars
  stat_function(
    fun = function(x, mean, sd, n) {
      n * dnorm(x = x, mean = mean, sd = sd)
    },
    args = with(Galton, list(mean = mean(child), sd = sd(child), n = length(child)))
  ) +
  scale_x_continuous("Heights of Children")
```

<img src="rnorms_files/figure-html/unnamed-chunk-8-1.png" width="40%" style="display: block; margin: auto;" />
]

---
# IMDb Movie Ratings (movies dataset)

.pull-left.midi[
```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data
data(movies)
plotted_dataset <- movies
plotted_dataset$variablex <- movies$rating
ggmovie <- ggplot(plotted_dataset, aes(x = variablex)) +
  geom_histogram(fill = "blue") +
  # frequency polygon of a simulated Normal sample with the
  # same size, mean, and SD as the ratings
  geom_freqpoly(aes(x = rnorm(length(variablex)) * sd(variablex) + mean(variablex)),
                colour = "black") +
  scale_x_continuous("IMDb Movie Ratings")
ggmovie
```
]

.pull-right[
<img src="rnorms_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />
]

---
# Temperature in Nottingham (nottem dataset)

<img src="rnorms_files/figure-html/nottem-1.png" width="80%" style="display: block; margin: auto;" />

---
# Standard Normal

- Useful properties of the standard normal distribution
  - Symmetric about 0
  - 50% of the area lies above 0 (and 50% below)
  - Total area under the curve is 1.0 (or 100%)

---
# Area under the Normal Distribution

<img src="../img/normal.png" width="60%" style="display: block; margin: auto;" />

---
class: middle

# Wrapping Up...

---
class: middle

# Rescaling

---
# Rescaling

- All Normal distributions are the same if we measure in units of size `\(\sigma\)`, with the mean `\(\mu\)` as the center.
- We can convert any variable to the same metric as the standard normal (mean 0, standard deviation 1).
- Changing to these units is called standardizing or rescaling.

---
# Converting Formulas

.pull-left[
- Sample statistics
  - `\(z_{i}=\frac{x_{i}-\bar{x}}{s}\)`
]

--

.pull-right[
- Population parameters
  - `\(z_{i}=\frac{x_{i}-\mu}{\sigma}\)`
]

---
# Merits

.pull-left[
- Advantages
  - Allows us to compare scores on a common metric
  - The origin is 0 (the mean)
  - One unit is 1 standard deviation
  - Positive values are above the mean
  - Negative values are below the mean
]

--

.pull-right[
- We can compare across measurement scales
- The shape of the distribution does NOT CHANGE
- We can go from z-scores back to raw scores: `\(x_{i}=\bar{x}+z_{i}s\)`
]

---
# Demo

```r
library(ggplot2movies)
# Rescaling: scale() subtracts the mean and divides by the SD (z-scores)
variable <- movies$rating
scale(variable)[1:10]
```

```
## [1] 0.30079877 0.04323788 1.45982279 1.45982279 -1.63090793
## [6] -1.05139592 -0.40749368 0.49396944 0.42957922 0.04323788
```

---
# Demo

.pull-left[
```r
plot(density(variable)) # no scaling
```

<img src="rnorms_files/figure-html/unnamed-chunk-12-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
```r
plot(density(scale(variable))) # with scaling
```

<img src="rnorms_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" />
]

---
# Worked Z-Score Problem

- Here are the IQ test scores of 31 seventh-grade girls in a Midwest school district.

<br>
<img src="../img/iqz.png" width="70%" style="display: block; margin: auto;" />

---
# Worked Z-Score Problem

A) We expect IQ scores to be approximately Normal.

--

- Make a stem plot to check that there are no major departures from normality.

--

<img src="../img/iqstem.png" width="65%" style="display: block; margin: auto;" />

---
# Worked Z-Score Problem

B) Find the mean and standard deviation.

--

- Mean = `\(\bar{x}=\frac{\sum_{i=1}^{n}x_{i}}{n}=\frac{3281}{31}=105.84\)`
- SD = `\(s=\sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}=\sqrt{\frac{\sum_{i=1}^{31}(x_{i}-105.84)^{2}}{30}}=14.27\)`

---
# Worked Z-Score Problem

C) What proportion of scores are within one standard deviation of the mean?
- One SD above the mean = 105.84 + 14.27 = 120.11
- One SD below the mean = 105.84 - 14.27 = 91.57
- 23 of the 31 scores fall between these values: 23/31 = .74

--

<img src="../img/workz.png" width="65%" style="display: block; margin: auto;" />

---
# Worked Z-Score Problem

D) What proportion of scores are within TWO standard deviations of the mean?

- TWO SD above the mean = 105.84 + 2(14.27) = 134.38
- TWO SD below the mean = 105.84 - 2(14.27) = 77.30
- 29/31 = .935

---
# Worked Z-Score Problem

E) What would these proportions be in an exactly Normal distribution?

- Within +/- one SD: about .68 (observed: .74)
- Within +/- two SDs: about .95 (observed: .935)
- Continued in PowerPoint...

<img src="../img/table.png" width="90%" style="display: block; margin: auto;" />

---
# Wrapping Up...
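
As a recap, both ideas from this deck can be checked in base R. This is a minimal sketch rather than code from the original course materials: `pnorm()` gives areas under the standard normal curve (the 68-95-99.7 rule), and z-scores computed with the sample formula should agree with `scale()`. The built-in `nottem` dataset is used here only as convenient example data.

```r
# 68-95-99.7 rule: area within k SDs of the mean of a Normal distribution
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 3)  # 0.683 0.954 0.997

# Rescaling: sample z-scores by hand, z = (x - mean) / s
x <- as.numeric(nottem)  # monthly temperatures (built-in dataset)
z <- (x - mean(x)) / sd(x)

# The hand-computed z-scores match scale() and have mean 0 and SD 1
all.equal(z, as.numeric(scale(x)))
round(c(mean = mean(z), sd = sd(z)), 10)

# Going back from z-scores to raw scores: x = mean + z * s
x_back <- mean(x) + z * sd(x)
all.equal(x_back, x)
```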